ABSTRACT


The  adjacency  map  is  a  data  structure  (a  tree)  used  to  solve  the 
following  problem:  given  a  set  of  parallel  segments  in  the  plane  and  a 
point  p,  find  the  segments  closest  to  p  among  those  intersected  by  the 
straight  line  through  p,  perpendicular  to  the  common  direction  of  the 
segments.  The  search  is  performed  in  the  repetitive  mode,  so  that 
preprocessing  is  convenient. 

The problem considered is a particular case of planar point location,
for which algorithms are known (Lipton-Tarjan, Kirkpatrick) that make use
of data structures constructed in time O(n log n), searched in time O(log n),
and stored in space O(n). Though asymptotically optimal, these
algorithms are not very practical. More practical algorithms have been
proposed (Preparata, Preparata-Lipski), which use O(n log n) space.

In  this  thesis  a  modification  of  these  algorithms  is  presented  for 
the  adjacency  map,  and  the  worst  case  analysis  is  performed. 

The  technique  is  easily  extensible  to  general  planar  graphs.  It  is 
conjectured  that,  under  reasonable  assumptions  on  the  input  distribution,  the 
new  algorithm  takes  expected  linear  storage. 

A probabilistic analysis substantiates this conjecture in the case of
the adjacency map, for a wide class of input distributions. In particular,
when the segments have independent endpoints, it is shown that the number of
nodes in the corresponding adjacency map is about 6 times the number of
segments. The results of the analysis have been confirmed by simulation.


1.  INTRODUCTION 


The adjacency map (AM), a data structure suitable for searching a set
of parallel segments in the plane, has been shown to be efficiently
applicable to the solution of several problems of planar computational
geometry [1]. The AM solves the following problem: given a set A of
parallel segments in the plane and a point p, find the segments closest to p
among those intersected by the straight line passing through p and
perpendicular to the common direction of the segments. In the sequel we assume
the y axis of the Cartesian plane to be parallel to the segments. If through
each endpoint w of each member of A we trace a horizontal half-line to the
right and one to the left, and we make such lines terminate when they meet
another segment or continue to infinity otherwise, we partition the plane
into regions. Two of these regions are half planes and all the others are
rectangles, possibly unbounded in one or both horizontal directions. Each
rectangle is an equivalence class of points of the plane with respect to
their horizontal adjacency to vertical segments, and can be called an
adjacency region.

The  foregoing  problem  is  a  special  case  of  point  location  in  a  planar 
subdivision  and  therefore  can  be  handled  by  one  of  the  general  techniques  for 
such  problems  [2, 3, 4, 5]. 

By using a method recently developed in [3], Lipski and Preparata [1]
have given an efficient algorithm for point location in the adjacency regions.
They make use of a static data structure that they call adjacency map, which
can be searched in time O(log n), constructed in time O(n log n), and stored
in space O(n log n), where n = |A| is the number of segments.


The reason to further consider the problem stems from a theoretical
result due to Lipton and Tarjan [4], and Kirkpatrick [5], who showed that
O(log n) searching time and O(n log n) preprocessing time can be achieved with
a data structure which only uses O(n) space. Even though the Lipton-Tarjan
method is algorithmically extremely complicated (no conclusion to the
contrary is available for Kirkpatrick's method), it suggests that a practical
algorithm may exist with the same performance.

In this thesis we consider a new algorithm for the AM, which is a
conceptually simple modification of the algorithm presented in [1]. While
the worst case asymptotic performance is the same as in [1], the average
performance is improved. We show that if the segments are independent of
each other, under some very weak assumptions, expected linear storage is
achieved, and also that, except for a presorting operation, the algorithm runs
in expected linear time. We also make use of a procedure different from the
one used in [1,2] to balance the search tree. The new procedure is simpler
and faster. Due to the nature of the procedure for building the search tree,
the bound on the depth of the tree is also reduced from ⌈5 log n⌉ to
3 log n + 7.

The thesis is organized as follows. In Section 2 the search tree is
defined together with a recursive procedure for constructing it, and Section
3 describes the balancing of the search tree. The worst case asymptotic
analysis is performed in Section 4. Section 5 is devoted to the probabilistic
analysis and is the main contribution of this thesis. We begin with the
definition of a suitable random model for the input of the algorithm. In
this model a segment is defined by two random variables U and V, called
generators, which are the (unordered) segment endpoints. A segment is
statistically described by the joint distribution of its generators. We
also assume that different segments are statistically independent of each
other. We show that, under this assumption, the statistical properties of
the algorithm are independent of the first order generator distribution and
are only affected by the correlation properties of the endpoints.
Therefore there is no loss of generality in considering uniform generators.
The analysis we carry out shows that, for a broad class of generator
joint distributions, the average number of nodes in the search tree is
linear in the number of segments. Numerical results are obtained for the
particular case of independent endpoints. For this case we get a theoretical
upper bound of 6.07 n for the number S of nodes in the tree. Simulation
completely agrees with this upper bound and gives an average number of nodes
close to 5.7 n. These results are very satisfactory, since it can be shown
that the search tree must contain at least 3n nodes, and therefore the
algorithm performs, on the average, within a factor 2 of the absolute lower
bound. Moreover the average result for this algorithm is particularly
meaningful since, due to the law of large numbers, the actual values
obtained for inputs of reasonably large size (say over 300) are very close
to the expected values.

The AM solves several problems related to collections of parallel
segments in the plane. Some of these problems are the Nontrivial-Contour,
the External-Contour, the Point-Location, and the Route-in-a-Maze, which
are defined and discussed in [1]. These and probably several other
problems of similar type arise in diverse fields of application such as
computer aided design, large-scale integration, operations research, data
base concurrency control, etc. Of course all of the foregoing problems will
take advantage of an efficient algorithm for the AM.


However, we think that the implications of the present study may be
wider. The algorithm that we present is in fact immediately extensible
to the problem of point location in a general straight-line planar
subdivision, and intuition suggests that its performance will be satisfactory
in the general case. But a rigorous proof of such a statement is not
straightforward and we hope to perform such a task in future work.


2.  DEFINITION  AND  CONSTRUCTION  OF  THE  SEARCH  TREE 

We recall here from [1] the definition of the AM. The AM is a binary
tree ℋ with two types of nodes having a different typographical representation:
"∇", a ∇-node or "horizontal node", is associated with a horizontal line y = c
and has the ordinate c as a discriminator; "O", an O-node or segment node,
is associated with a straight line segment on the line x = d and has the
abscissa d as discriminator.

To search for a given point p = (x,y) corresponds to tracing a path in ℋ
from the root to a leaf. The discriminator is compared against y for ∇-nodes
and against x for O-nodes. The path makes a turn to the left when the point
coordinate is smaller than the discriminator and to the right otherwise.
Along the path the largest abscissa smaller than x, and the smallest abscissa
larger than x, initialized to -∞ and +∞ respectively, are recorded and give,
when a leaf is reached, the left and the right adjacent segments, with an
infinite abscissa corresponding to the case of no adjacent segment.
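
The search just described can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the implementation of the
thesis: the class Node, the function locate, and the tags "V" (for a ∇-node)
and "O" (for a segment node) are names chosen here, and an absent child (None)
plays the role of a leaf. The small tree at the end was built by hand for a
single segment on the line x = 2 with endpoint ordinates 1 and 3.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    kind: str                          # "V": horizontal (cut) node, "O": segment node
    key: float                         # ordinate c for "V", abscissa d for "O"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def locate(root: Optional[Node], x: float, y: float) -> Tuple[float, float]:
    """Return the abscissae of the left and right adjacent segments of (x, y).

    An infinite value means that there is no adjacent segment on that side.
    """
    left_x, right_x = -math.inf, math.inf
    node = root
    while node is not None:
        if node.kind == "V":           # compare y against the ordinate
            node = node.left if y < node.key else node.right
        elif x < node.key:             # segment lies to the right of the point
            right_x = min(right_x, node.key)
            node = node.left
        else:                          # segment lies to the left (or through) the point
            left_x = max(left_x, node.key)
            node = node.right
    return left_x, right_x


# Hand-built tree for one segment: x = 2, ordinates from 1 to 3.
tree = Node("V", 1, None, Node("V", 3, Node("O", 2), None))
print(locate(tree, 1.0, 2.0))          # (-inf, 2): the segment is on the right
print(locate(tree, 3.0, 2.0))          # (2, inf): the segment is on the left
print(locate(tree, 1.0, 5.0))          # (-inf, inf): no segment at this height
```

Each step costs one comparison, so the query time is proportional to the
depth of the tree, which is why the depth is bounded in what follows.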

Several  different  search  trees  are  possible  for  the  same  set  of  segments. 
We  are  particularly  interested  in  bounding  the  depth  of  the  tree  (to  bound  the 
query  time)  and  the  total  number  of  nodes  (to  bound  the  storage) .  The  latter 
objective  is  the  minimization  of  the  number  of  nodes  in  the  tree  having  the 
same  horizontal  or  vertical  discriminator.  The  former  objective  requires 
instead  a  suitable  balancing  strategy  when  constructing  the  tree. 

The complete definition of the AM is given by specifying the algorithm
that constructs it. To describe both the Lipski-Preparata and the present
algorithm let us introduce some definitions. A slab [B,T], with B < T, is
the strip of plane contained between the lines y = B and y = T. A vertical
segment s = (X; Y1, Y2) spans slab [B,T] if Y1 ≤ B and T ≤ Y2. Given a slab
[B,T] and a sequence Q of vertical segments sorted according to increasing
abscissa and having a nontrivial intersection with the slab, the spanning
segments of Q partition the slab into a set of regions, which we technically
call rectangles (these regions are actual rectangles, possibly unbounded on
one or both sides).

The philosophy of the AM is the following. Once a point has been located
within a slab, comparisons against the spanning segments of the slab locate
the point in a rectangle. The rectangle is then sliced into two slabs by some
line y = M, with B < M < T; a comparison against M locates the point in one
of these slabs and from there on the search proceeds recursively in the same
fashion. The search terminates when the rectangle examined is empty.

The foregoing ideas naturally suggest a recursive procedure for building
the tree in which two main steps remain to be specified: (i) how to choose M
in a given rectangle (choice of the cut point); (ii) how to organize in a
binary tree the O-nodes corresponding to the segments spanning the slab and
the ∇-nodes corresponding to cut points in the rectangles. We shall describe
step (i) in this section and step (ii) in the next one.

The choice of the cut point constitutes the main difference between the
old and the new versions of the AM. In [1] the cut point is M = ⌊(B+T)/2⌋
for each rectangle in the slab [B,T]. We propose instead to select M as
the median of the ordinates of all the segment endpoints falling in the
rectangle. This choice is aimed at decreasing the numbers of both vertical
and horizontal nodes. As an example, consider a rectangle in the slab
[n/2,n] containing only one segment endpoint, say n/2 + 1, the other endpoint
being in slab [n,2n]. The situation is illustrated in Figure 2.1 for n = 16.


(a)  new  algorithm 


(b)  old  algorithm 


Figure  2.1.  Comparison  of  different  criteria  for  the  choice  of  the 
cut  point. 

We can see, referring to Figure 2.1 (a), that the new algorithm constructs
a tree with only one vertical node (corresponding to ordinate 9) and one
horizontal node (corresponding to the segment abscissa). In Figure 2.1 (b)
we see that the old algorithm performs three cuts (at ordinates 12, 10, and 9);
correspondingly 3 vertical nodes and 3 horizontal nodes (with the segment
abscissa) will be allocated in the tree. For a generic value of n the new
algorithm still uses one vertical and one horizontal node for the search tree
of the given rectangle, while in the same case the old algorithm uses
O(log n) nodes of both types.
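
The count in this example can be reproduced with a short Python sketch. It
follows a single segment through the recursive slicing and counts the nodes
allocated for it under either rule. The function name nodes_for_segment and
the rule tags "midpoint" and "median" are assumptions of this sketch;
ordinates are taken to be integers, and the far endpoint is placed at n + 1,
just inside slab [n,2n].

```python
from typing import Tuple


def nodes_for_segment(b: int, t: int, y1: int, y2: int, rule: str) -> Tuple[int, int]:
    """Count (cut nodes, segment nodes) allocated for one segment (y1, y2).

    The segment must have a nontrivial intersection with slab [b, t].
    """
    if y1 <= b and t <= y2:            # the segment spans the slab: one segment node
        return 0, 1
    if rule == "midpoint":             # rule of [1]: M = floor((B + T) / 2)
        m = (b + t) // 2
    else:                              # proposed rule: median of the endpoints inside
        inside = sorted(y for y in (y1, y2) if b < y < t)
        m = inside[(len(inside) - 1) // 2]
    cuts, segs = 1, 0                  # one cut node for the line y = m
    if y1 < m:                         # part of the segment lies in slab [b, m]
        c, s = nodes_for_segment(b, m, y1, y2, rule)
        cuts, segs = cuts + c, segs + s
    if y2 > m:                         # part of the segment lies in slab [m, t]
        c, s = nodes_for_segment(m, t, y1, y2, rule)
        cuts, segs = cuts + c, segs + s
    return cuts, segs


n = 16
print(nodes_for_segment(n // 2, n, n // 2 + 1, n + 1, "midpoint"))  # (3, 3)
print(nodes_for_segment(n // 2, n, n // 2 + 1, n + 1, "median"))    # (1, 1)
```

For n a power of two the midpoint rule allocates log n - 1 nodes of each type
in this configuration, while the median rule always allocates one of each.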


We are now ready to define more formally a procedure SEARCHTREE
which recursively constructs the AM. The inputs are a slab [B,T] and a
queue Q of segments, sorted from left to right and intersecting the slab.
The output is the corresponding search tree. We also assume that the
endpoints have been presorted and their ordinates normalized to the set
{1, 2, ..., 2n}, where n is the number of segments. Calling Q_0 the
queue of all the input segments, the AM is built by the call

    ℋ = SEARCHTREE (1, 2n; Q_0).                              (2.1)

The generic call decomposes the queue Q into strings σ's of consecutive
spanning segments and γ's of consecutive nonspanning segments:

    Q = σ_0 γ_1 σ_1 ... γ_r σ_r.                              (2.2)

In (2.2), σ_0 and σ_r may be empty, while all the other strings are nonempty.
A string γ_i corresponds to a nonempty rectangle and, after computing the cut
point M_i (see Figure 2.2 (a)), two queues Q_i1 and Q_i2 are formed with the
segments of the i-th rectangle that intersect slab [B,M_i] and slab [M_i,T]
respectively. A ∇-node ∇_i with discriminator M_i is created such that

    LEFTSON (∇_i) = root (SEARCHTREE (B, M_i; Q_i1));         (2.3a)

    RIGHTSON (∇_i) = root (SEARCHTREE (M_i, T; Q_i2)).        (2.3b)

An O-node is created for each spanning segment, with discriminator equal
to the abscissa of the segment. The nodes are stored in a queue U
(see Figure 2.2 (b)). Once the queue U is completed, it is restructured
into a balanced tree 𝒰 by the procedure BALANCE to be described in the
next section (see Figure 2.2 (c)).

A pidgin algol program for procedure SEARCHTREE is given in Table 1.
Q, Q1, Q2, U, Z are queues; s ⇐ Q and Q ⇐ s denote the operations
POP s from Q and PUSH s into Q respectively. Queue Z is used to temporarily
store the segments contained in a rectangle. The subroutine CUTPOINT
computes the ordinate M at which we slice the rectangle. When M is known
the queues Q1 and Q2 of the segments that intersect the lower and upper
slabs, into which the rectangle is divided, can be formed. We notice that
in the Lipski-Preparata algorithm, M = ⌊(B+T)/2⌋ does not depend on
which segments are in the rectangle and therefore queues Q1 and Q2 may be
directly formed without using queue Z (compare with [3], [1]). This is not
the case for our version of the algorithm, in which CUTPOINT computes a
median of the endpoints within the rectangle.

Table 1. A pidgin algol program for procedure SEARCHTREE.

    procedure SEARCHTREE (B,T;Q)
    begin Z, Q1, Q2, U := ∅
      while Q ≠ ∅ do
      begin s = (X;Y1,Y2) ⇐ Q
        if (B < Y1) or (Y2 < T)      (* s does not span slab [B,T] *)
          then Z ⇐ s
        if (Y1 ≤ B) and (T ≤ Y2) or (Q = ∅)
          then begin                 (* either s spans slab [B,T]
                                        or is the last element in Q *)
            if Z ≠ ∅
              then begin             (* the rectangle is non-empty *)
                M := CUTPOINT (B,T;Z);
                begin while (Z ≠ ∅) do
                  z = (X;Y1,Y2) ⇐ Z
                  if (Y1 < M) then Q1 ⇐ z
                  if (Y2 > M) then Q2 ⇐ z
                end                  (* while *)
                ∇ := new horizontal node of ℋ
                LEFTSON(∇) := root (SEARCHTREE (B,M;Q1));
                RIGHTSON(∇) := root (SEARCHTREE (M,T;Q2));
                U ⇐ ∇
              end                    (* if Z ≠ ∅ *)
          end
        if (Y1 ≤ B) and (T ≤ Y2)
          then U ⇐ s                 (* s spans slab [B,T] *)
      end                            (* while Q ≠ ∅ *)
      𝒰 := BALANCE (U)
      return 𝒰
    end                              (* SEARCHTREE *)

To find the median of the points inside a given rectangle a standard
median algorithm [6] can be used, which will work in time linear in the
number of points. Another solution, probably faster and easier to implement,
can be obtained by a simple modification of the procedure SEARCHTREE. The
idea is to maintain, together with the queue Q of the segments, a list P of
their endpoints contained in the slab, sorted by increasing ordinate.
Each segment of Q has pointers to its endpoints, when they are in P; the
pointers are null otherwise. In a first scan of Q, rectangles are formed
and points of P are marked with the name of the rectangle to which they
belong. In a second scan, a list P_i of sorted points is easily built for
each rectangle with a kind of "unmerge" procedure, and its median is found.


When the rectangle is cut, the corresponding list is cut into two sublists
P_i1 and P_i2 for the lower and upper slabs, respectively, by simply eliminating
the median point. For instance, in the example given in Figure 2.2, the
list P of points will be

    P: 4, 6, 7, 11, 13, 15, 16.

When scanning Q three rectangles are formed, the first including segment
γ_11, the second including segments γ_21, γ_22, γ_23, and the third including
segment γ_31. Correspondingly the endpoints in P are marked as follows
(markers are in brackets):

    P: 4[2], 6[1], 7[3], 11[2], 13[1], 15[2], 16[3].

Now a scan of P will provide the lists of sorted points for each of the
rectangles

    P_1: 6, 13;  P_2: 4, 11, 15;  P_3: 7, 16.

Considering now P_2, its median M_2 = 11 is easily obtained as the middle point
of the queue. Splitting P_2 by eliminating M_2 yields the queues P_21: 4, and
P_22: 15.
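
The "unmerge" step and the split at the median can be sketched in Python as
follows, using the data of the example above. The function names unmerge and
split_at_median are assumptions of this sketch, and plain Python lists stand in
for the linked lists with pointers described in the text. For a list of even
length the lower of the two middle points is taken, which is one of the two
choices the text allows.

```python
from typing import Dict, List, Tuple


def unmerge(points: List[int], marks: List[int]) -> Dict[int, List[int]]:
    """Distribute the sorted points among rectangles according to their marks.

    One scan suffices, and each resulting list is still sorted.
    """
    lists: Dict[int, List[int]] = {}
    for y, rect in zip(points, marks):
        lists.setdefault(rect, []).append(y)
    return lists


def split_at_median(p: List[int]) -> Tuple[int, List[int], List[int]]:
    """Return the median of a sorted list and the two sublists around it."""
    mid = (len(p) - 1) // 2
    return p[mid], p[:mid], p[mid + 1:]


lists = unmerge([4, 6, 7, 11, 13, 15, 16], [2, 1, 3, 2, 1, 2, 3])
print(lists[1], lists[2], lists[3])        # [6, 13] [4, 11, 15] [7, 16]
print(split_at_median(lists[2]))           # (11, [4], [15])
```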

To summarize the ideas introduced so far, we have shown in Figure 2.3
a set of segments with the corresponding adjacency regions. In Figure 2.4
the partition into rectangles induced by the algorithm is shown. Note that
each region of this partition is contained in an adjacency region. Finally in
Figure 2.5 we give the search tree built by the algorithm for the same set of
segments as in Figure 2.3. The search tree is built according to the
BALANCE procedure we are going to describe in the next section.


Figure  2.3.  A  set  of  8  vertical  segments  and  the  corresponding  25  adjacency 
regions,  which  are  labeled  with  capital  letters  from  A  to  Y. 

The  segments  are  numbered  according  to  their  increasing 
abscissa. 
Figure 2.4. Partition of the plane created by the algorithm. Each region
corresponds  to  a  leaf  in  the  search  tree  and  is  contained  in 
some  adjacency  region;  this  correspondence  is  reflected  by  the 
notation.  When  a  segment  is  cut  in  several  parts  each  part  is 
labeled  with  the  abscissa  of  the  original  segment  plus  a  lower 
case  letter,  serving  illustration  purposes  only. 
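
The construction of this section can be put together in a compact Python
sketch. It is an illustration under several assumptions and not the program of
Table 1: lists replace queues, the names Node, cutpoint, simple_balance,
search_tree and count are chosen here, abscissae and endpoint ordinates are
assumed distinct, and the balancing is a simplified stand-in that splits at the
middle O-node and ignores the weights used by BALANCE in Section 3. The
initial slab is taken as [0, 2n+1] rather than [1, 2n], so that every endpoint
lies strictly inside it; this is a choice of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Segment = Tuple[float, int, int]           # (x, y1, y2) with y1 < y2


@dataclass
class Node:
    kind: str                              # "V": cut node, "O": segment node
    key: float                             # ordinate for "V", abscissa for "O"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def cutpoint(b: int, t: int, z: List[Segment]) -> int:
    """Median ordinate of the endpoints lying strictly inside slab (b, t)."""
    inside = sorted(y for _, y1, y2 in z for y in (y1, y2) if b < y < t)
    return inside[(len(inside) - 1) // 2]


def simple_balance(u: List[Node]) -> Optional[Node]:
    """Stand-in for BALANCE: split at the middle O-node, ignoring weights."""
    o_pos = [i for i, node in enumerate(u) if node.kind == "O"]
    if not o_pos:                          # empty rectangle (None) or a lone V-node
        return u[0] if u else None
    mid = o_pos[len(o_pos) // 2]
    u[mid].left = simple_balance(u[:mid])
    u[mid].right = simple_balance(u[mid + 1:])
    return u[mid]


def search_tree(b: int, t: int, q: List[Segment]) -> Optional[Node]:
    """q: segments sorted by abscissa, each intersecting slab [b, t]."""
    u: List[Node] = []                     # queue U of the text
    z: List[Segment] = []                  # queue Z: the current rectangle

    def close_rectangle() -> None:
        if not z:
            return
        m = cutpoint(b, t, z)
        q1 = [s for s in z if s[1] < m]    # segments reaching slab [b, m]
        q2 = [s for s in z if s[2] > m]    # segments reaching slab [m, t]
        u.append(Node("V", m, search_tree(b, m, q1), search_tree(m, t, q2)))
        z.clear()

    for s in q:
        if s[1] <= b and t <= s[2]:        # s spans the slab
            close_rectangle()
            u.append(Node("O", s[0]))
        else:
            z.append(s)
    close_rectangle()
    return simple_balance(u)


def count(node: Optional[Node], kind: str) -> int:
    """Number of nodes of the given kind in the tree rooted at node."""
    if node is None:
        return 0
    return (node.kind == kind) + count(node.left, kind) + count(node.right, kind)


segments = sorted([(1, 3, 9), (2, 1, 5), (3, 7, 12), (4, 2, 11), (5, 6, 8), (6, 4, 10)])
n = len(segments)
root = search_tree(0, 2 * n + 1, segments)
print(count(root, "V"), count(root, "O"))  # cut nodes, segment nodes
```

With distinct ordinates every endpoint is chosen as a cut point exactly once,
so the first number printed is 2n; the second is at least n, since every
segment eventually spans some slab.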


3.  SEARCH  TREE  BALANCING  PROCEDURE  (BALANCE) 

As we have seen in the last section, one call of the procedure SEARCHTREE
builds a list of O-nodes and ∇-nodes to be structured into a binary tree. In
building such a tree the objective is to bound its depth. The procedure
BALANCE is designed to achieve an O(log n) depth. The input of BALANCE is
the queue

    U = s_01 ... s_0n_0 ∇_1 ... ∇_r s_r1 ... s_rn_r,

where the s's are O-nodes and the ∇'s are ∇-nodes, which are roots of partial
search trees corresponding to nonempty rectangles of the current slab.

The balancing is based on the number p_i of endpoints contained in the
i-th rectangle; p_i is also the number of ∇-nodes in the subtree with root ∇_i.
In fact at each recursive call a median is selected for each nonempty
rectangle of the slab, and correspondingly a ∇-node is allocated. Each
endpoint is the median of some rectangle and is therefore associated with a
∇-node; on the other hand, when the point is selected as a median, it is no
further considered by subsequent calls of SEARCHTREE, so that there is one
∇-node for each endpoint.

Procedure BALANCE works as follows on input U:

(1) if U consists of O-nodes only, then they are arranged in a balanced
binary tree;

(2) if U contains ∇-nodes let K = p_1 + ... + p_r and let j be defined by the
equations

    p_1 + ... + p_(j-1) ≤ K/2,                                (3.1a)

    p_1 + ... + p_j > K/2,                                    (3.1b)

which imply

    p_L = p_1 + ... + p_(j-1) ≤ K/2,                          (3.2a)

    p_R = p_(j+1) + ... + p_r < K/2.                          (3.2b)

Calling s_L and s_R the left and the right spanning segments bounding the
j-th rectangle, we decompose U as

    U = U_1 s_L ∇_j s_R U_2.

𝒰 = BALANCE (U) is recursively defined in terms of 𝒰_1 = BALANCE (U_1) and
𝒰_2 = BALANCE (U_2) as shown in Figure 3.1 (the cases p_L ≥ p_R and p_L < p_R
are distinguished only to improve the average depth, while for the worst
case either of the structures (a) and (b) would work). The two subtrees of
node ∇_j are the search trees corresponding to the slabs into
which the j-th rectangle is sliced. They will be considered later, in
the analysis of the tree.


Figure  3.1.  Recursive  definition  of  BALANCE. 


In order to implement the balancing efficiently, a vector can be used
to store the partial sums p_1 + ... + p_k, k = 1, ..., r. While this technique
requires O(r) time at the beginning, it allows us to find the node ∇_j in time
logarithmic in the number of ∇-nodes involved in each recursive call of
BALANCE. If there are only O-nodes, it is also clear that they can be
balanced in time linear in their number. In conclusion, BALANCE runs in
time linear in the total number of nodes of the input queue U.
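
The use of the vector of partial sums can be sketched in Python with a binary
search from the standard bisect module. The names prefix_sums and split_index
are assumptions of this sketch; the index j returned is the one defined by
(3.1a) and (3.1b), restricted to the ∇-nodes lo, ..., hi handled by the current
recursive call. Every p_i is assumed to be at least 1, which holds because only
nonempty rectangles produce ∇-nodes.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def prefix_sums(p: List[int]) -> List[int]:
    """prefix[k] = p_1 + ... + p_k, with prefix[0] = 0; built once in O(r) time."""
    return [0] + list(accumulate(p))


def split_index(prefix: List[int], lo: int, hi: int) -> int:
    """Index j (1-based, lo <= j <= hi) such that

        p_lo + ... + p_(j-1) <= K/2   and   p_lo + ... + p_j > K/2,

    where K = p_lo + ... + p_hi. Found in logarithmic time by binary search.
    """
    base = prefix[lo - 1]
    half = (prefix[hi] - base) / 2
    return bisect_right(prefix, base + half, lo, hi + 1)


prefix = prefix_sums([3, 1, 4, 2])     # K = 10
print(split_index(prefix, 1, 4))       # 3: 3 + 1 <= 5 and 3 + 1 + 4 > 5
print(split_index(prefix, 1, 2))       # 1: recursive call on the left part
```

Because the same vector serves every recursive call through the offsets lo
and hi, no partial sum is ever recomputed.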


4.  WORST  CASE  PERFORMANCE  ANALYSIS  AND  STORAGE  LOWER  BOUNDS 
In  this  section  we  analyze  the  worst  case  asymptotic  performance  of  the 
method,  considering  the  time  to  build  the  search  tree  (preprocessing  time), 
the  number  of  nodes  (storage)  and  the  depth  of  the  tree  (search  time)  and 
showing  that  they  are  O(nlogn),  O(nlogn)  and  O(logn)  respectively.  In  the 
last  part  of  the  section  we  introduce  lower  bound  considerations  for 
storage,  which  will  be  used  in  the  next  section  for  an  appraisal  of  the 
results  of  the  probabilistic  analysis  of  the  number  of  nodes. 

We begin by proving a lemma showing that the number of segments
and the number of points, either in a rectangle or in a slab, are of the
same order.


Lemma 4.1. Let e_R and p_R be respectively the numbers of segments and
points in a rectangle, and e_S and p_S the same quantities in a slab
generated by the algorithm. We have p_R = Θ(e_R), and p_S = Θ(e_S).

Proof. In a rectangle there are no spanning segments and therefore each
segment has at least one endpoint inside the rectangle; on the other hand
a segment has at most two endpoints, hence e_R ≤ p_R ≤ 2 e_R, which proves
p_R = Θ(e_R). Let us now consider a slab obtained by horizontally cutting
a rectangle at the median of the segment endpoints. If p_R is odd both the
lower and upper slabs contain p_S = (p_R - 1)/2 points. If p_R is even one
slab contains p_R/2 and the other p_R/2 - 1 points. In any case
p_R/2 - 1 ≤ p_S ≤ p_R/2, so that p_S = Θ(p_R) = Θ(e_R). Since a segment in the
rectangle may generate at most a segment in a slab, e_S ≤ e_R ≤ p_R.
Moreover, the number of segments in the slab is at least one half of the
number of points, hence e_S ≥ p_S/2 ≥ (p_R/2 - 1)/2. Therefore
(p_R/4 - 1/2) ≤ e_S ≤ p_R and e_S = Θ(p_R). But we have seen that
p_S = Θ(p_R), hence p_S = Θ(e_S).


Lemma 4.1 is useful because it shows that, in the asymptotic
analysis, the work done by the algorithm in a slab or in a rectangle can be
charged indifferently to the points or the segments. We now analyze
separately the three performance measures.

Preprocessing time. We note that each point belongs to at most O(log n)
slabs. In fact each call of the procedure SEARCHTREE processes no more than
half of the number of points processed by the calling procedure. Also we
note that at each call the work done to prepare subsequent calls is linear
in the number of points (or segments) processed. In fact, a constant amount
of work is required for each segment to decide if it is spanning or not and,
when it is not spanning, to insert it in the queues for the lower and upper
slabs of the rectangle, if appropriate. Moreover the median of the points in
each rectangle is found in time linear in the number of points (within the
rectangle) either by using a median finding algorithm or by the method
proposed in Section 2.

The BALANCE procedure also takes time linear in the total number of tree
nodes that it processes, and hence in the number of points of the
corresponding slab. In fact the number of spanning segments is O(e_S) = O(p_S)
and the number r of rectangles is certainly O(p_S).

In conclusion for each call O(1) work is done for each point processed
by the call, resulting in O(n log n) total time for the procedure SEARCHTREE,
and therefore for the entire algorithm including presorting of segments and
endpoints.

Storage. The storage is proportional to the number of nodes in the search
tree. The number of ∇-nodes is 2n. The number of O-nodes for each segment
is one plus the number of times the segment is cut by the median of a
rectangle containing it. The total number of cuts can be easily bounded
considering that in a rectangle of w points at most w-1 segments are cut by the
median, and that in a rectangle with 1 and 2 points the maximum number
of cuts is 0 and 1 respectively. So the function f(w) recursively defined by

    f(1) = 0,  f(2) = 1,                                      (4.1a)

    f(2w) = f(w) + f(w-1) + 2w - 1,                           (4.1b)

    f(2w+1) = 2 f(w) + 2w,                                    (4.1c)

is a bound for the number of cuts when processing w points. Since
f(w) = O(w log w), the total number of nodes in the tree is O(n log n).
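
The growth of this bound can be checked numerically with a short Python
sketch that evaluates a recurrence of the form (4.1a)-(4.1c). The function
name f follows the text; the convention f(0) = 0 and the comparison against
w log w (logarithm in base 2) are assumptions of the sketch, used only to
illustrate the O(w log w) behaviour.

```python
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def f(w: int) -> int:
    """Bound on the number of cuts when processing w points."""
    if w <= 1:                         # f(1) = 0, and f(0) = 0 by convention
        return 0
    if w == 2:
        return 1
    h = w // 2
    if w % 2 == 0:                     # w = 2h: slabs with h and h-1 points
        return f(h) + f(h - 1) + w - 1
    return 2 * f(h) + w - 1            # w = 2h+1: two slabs with h points


for w in (2, 3, 4, 8):
    print(w, f(w))                     # 2 1, 3 2, 4 4, 8 13

# The bound stays below w log w over the whole range tested.
print(all(f(w) <= w * log2(w) for w in range(1, 2001)))   # True
```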

Search time. This part of the analysis is somewhat delicate because the depth
δ(𝒰) of the tree 𝒰 = BALANCE (U) depends on the level of recursion of
SEARCHTREE in which the queue U has been formed (through the number of O-nodes
and the weight of ∇-nodes), and also on the way in which the nodes are arranged
in the tree. In fact, suppose a ∇-node ∇ is placed at distance d_1 (*) from
the root of 𝒰 and that the subtree rooted in ∇ has depth d_2; then
δ(𝒰) ≥ d_1 + d_2. The same is true when a subtree of O-nodes of depth d_2 is
formed and its root is placed at distance d_1 from the root of 𝒰. These
considerations lead us to the following definitions and remarks.

(*) According to standard terminology, this distance is the number of arcs
in the path from ∇ to the root.

1) The weight of a slab is defined as K = p_1 + ... + p_r, and is the total
number of endpoints contained in the slab.


2) The number H of spanning segments in a slab of weight K is at most
K + 2. To prove this claim, we consider the rectangle whose split
generates the slab. A segment spanning the slab must have one
endpoint in the rectangle; this endpoint either is the cutpoint or lies
in the companion slab originating from the same rectangle, which has at
most K + 1 points.

3)  The  level  of  a  recursive  call  of  BALANCE  is  defined  as  follows: 

BALANCE (U)  has  level  0;  the  calls  made  by  a  call  at  level  i  have 
level  i  +  1. 

We are now ready to state the following lemma.

Lemma 4.2. The tree 𝒰 constructed by the procedure BALANCE for a slab of
weight K has a depth δ(K) ≤ 3 log K + 4.

Proof. (By induction on K).

Basis. For K = 1, a slab may have at most one nonspanning and two spanning
segments and therefore 𝒰 has at most three O-nodes and one ∇-node, so that
δ(𝒰) ≤ 4.

Inductive step. We assume now that, for K' < K, δ(K') ≤ 3 log K' + 4.

(i) ∇-nodes. From the definition of BALANCE and Eqs. (3.2a) and (3.2b) it
is easy to see that at each call the weight of the input is at least halved,
so that the input of a call at level i has weight at most K/2^i. Also, if
there are ∇-nodes, the i-level call allocates one of them, say ∇, at a
distance no more than 2(i+1) from the root of 𝒰. The subtrees of ∇ are
balanced trees of weight no more than K/2^(i+1), and by the inductive hypothesis
they have a depth ≤ 3 log(K/2^(i+1)) + 4 = 3 log K - 3(i+1) + 4; therefore the
distance of the leaves of such subtrees from the root of 𝒰 is
≤ 3 log K - 3(i+1) + 4 + 2(i+1) + 1 ≤ 3 log K + 4.


(ii) O-nodes. If the input of a call at level i+1 has no ∇-nodes, but
the calling procedure at level i has some, we argue as follows. Since the
input of the calls at level i has weight less than K/2^i, i cannot be larger
than ⌈log K⌉. Therefore the root of the tree of O-nodes built by the call
at level (i+1) has a distance from the root of 𝒰 which is at most
2(⌈log K⌉ + 1), and the tree has a depth at most ⌈log(K+2)⌉ since there are
fewer than K + 2 nodes in the input. In conclusion the distance of the leaves
of the O-node tree from the root of 𝒰 is at most 2(⌈log K⌉ + 1) + ⌈log(K+2)⌉
≤ 3 log K + 4. This completes the proof of the lemma.

Considering that the search tree is constructed by a call of BALANCE on an input of weight $K = 2n$, we obtain the following.

Theorem 4.1. The entire search tree has a depth bounded by $3 \log n + 7$.

Lower bounds. We have already said in the introduction that there are point-location algorithms linear in the storage, and we will show in the next section that our algorithm uses expected linear storage. It is also trivial to see that linear storage is asymptotically optimal, i.e., within a multiplicative constant of the minimum. Now we would like to know something more about such a constant. We obtain the following simple but interesting result: the number of nodes in the search tree for a set of n segments with distinct endpoint ordinates is at least 3n. Notice that this is a lower bound not only for the worst case, but for all the instances of the problem, and therefore applies also to average results. We give here two arguments to establish the stated result.


The first argument is almost trivial. We observe that each segment is specified by 2 endpoint ordinates and one abscissa; therefore it will generate at least 2 ∇-nodes and one O-node in any search tree able to solve the adjacency problem. In fact, by changing only one of these parameters we change at least some adjacency region, and therefore the parameter must appear in the tree to account for this change.

Another argument stems from different considerations. The search tree is a binary tree, and therefore the number of different search paths (including the exit from the last node traversed, which can be left or right) equals the number of nodes in the tree plus one. Each path corresponds to a region (see Figure 2.5 as an example) of the partition reflected in the adjacency map. We can conclude that the number S of nodes in the tree and the number A of adjacency regions must satisfy $S \ge A - 1$. In order to complete our reasoning we need to estimate A. If all the endpoint ordinates are distinct, there are $A = 3n + 1$ adjacency regions [1], and therefore $S \ge 3n$.
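The counting step of the second argument can be checked on a small example. The sketch below is only an illustration: the `Node` class and the helpers `count` and `min_nodes` are hypothetical names, not part of the algorithm. It confirms that the number of exits (missing children) of a binary tree is the number of nodes plus one, and evaluates the resulting bound $S \ge A - 1$ with $A = 3n + 1$.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node of a binary search tree; a missing child is an exit."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def count(node: Optional[Node]) -> tuple[int, int]:
    """Return (number of nodes, number of exits) of the subtree."""
    if node is None:
        return 0, 1  # an empty link is where one search path leaves the tree
    left_nodes, left_exits = count(node.left)
    right_nodes, right_exits = count(node.right)
    return left_nodes + right_nodes + 1, left_exits + right_exits


def min_nodes(n_segments: int) -> int:
    """Lower bound S >= A - 1, with A = 3n + 1 adjacency regions."""
    regions = 3 * n_segments + 1
    return regions - 1


# An arbitrary six-node tree, used only to exercise the counting identity.
tree = Node(Node(None, Node()), Node(Node(), Node()))
nodes, exits = count(tree)
print(nodes, exits)
print(min_nodes(10))
```

Since every adjacency region must be reached by its own exit, a tree with fewer than 3n nodes could not distinguish all the regions.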

We  will  reconsider  the  3n  lower  bound  for  S  in  the  next  section,  when 
analyzing  the  average  behavior  of  our  algorithm. 


5.  PROBABILISTIC  ANALYSIS 


In  this  section  we  derive  some  results  on  the  average  performance  of 
the  algorithm,  the  main  purpose  being  to  show  that  the  expected  storage  is 
linear  in  the  number  of  segments.  We  also  show  that  the  expected  time  for 
the procedure SEARCHTREE is linear.

5.1.  Random  Model 

To obtain average-case performance results we need a probabilistic model for the input of our algorithm, i.e. for the set Cl = $\{s_1, \ldots, s_n\}$ of segments. We have to consider that, while we are dealing with segments whose endpoints are real numbers, the only feature of the input which is relevant to the algorithm is the relative order of the endpoints of the input segments. In other words, all the inputs that result in the same set of normalized segments are equivalent. Therefore the number of possible inputs, for a given input size n, is essentially finite.

In  principle  the  input  Cl  is  probabilistically  described  by  the  joint 
distribution  of  the  endpoints.  From  this  distribution  the  probability  of 
each  set  of  normalized  segments  can  be  computed  and,  for  each  normalized 
input,  the  size  of  the  search  tree  built  by  the  algorithm  can  be  calculated. 
The  expected  size  of  the  tree  could  be  obtained  by  averaging  the  tree  size 
according  to  the  computed  distribution  of  normalized  inputs. 

In practice the approach outlined above would result in a very heavy combinatorial problem and can hardly be used. To overcome this difficulty a suitable model of the set Cl will be introduced that, while simplifying the analysis, will still preserve the main features of the original problem.


The difficulties in analyzing the average behavior of the algorithm, when operating on a finite input, arise mainly from two facts: (1) the cutpoints are random variables; (2) there are some "boundary effects" that make some statistical properties, e.g. the average number of cuts affecting a given segment $s_j$, dependent upon j, or, in other words, not stationary. On the other hand the median of a very large number of random variables has generally very small variance and, for a reasonably large number of segments, the "boundary effects" should be all but negligible.

The foregoing considerations suggest that the analysis would be greatly simplified by considering the case of an infinite number of segments, and therefore modeling the input by a pair-valued random process

$s_j = (B_j, T_j), \quad B_j \le T_j, \quad j \in Z$,    (5.1)

where $B_j$ and $T_j$ are the bottom and top ordinates of segment $s_j$, Z is the set of the integers, and the abscissa $X_j$ of $s_j$ is increasing with j. Notice that the actual value of $X_j$ is irrelevant to the algorithm as long as the order of the segments does not change.

In the following we derive results using model (5.1) for the input of our algorithm, with further assumptions on the process $\{s_j,\ j \in Z\}$. The entire analysis will be carried out under:

Assumption A1: the process $s_j$ is a sequence of mutually independent and identically distributed random pairs.

According to Assumption A1 the process $s_j$ will be probabilistically specified in a complete manner by the segment-endpoint joint distribution

$F_{BT}(b,t) \triangleq P[B_j \le b,\ T_j \le t]$,    (5.2)

which is, by hypothesis, independent of j. Here and in the sequel, we follow the convention of denoting a random variable by a capital letter and the values it assumes by the corresponding lower case letter.

Generators. While the bottom B and the top T are a natural description of a vertical segment, for our purposes it is convenient to consider the segment endpoints U and V to be an unordered pair from which B and T can be recovered as

$B = \min\{U,V\}$,    (5.3a)

$T = \max\{U,V\}$.    (5.3b)

We call U and V generators of the segment and we describe them probabilistically by the joint distribution

$F_{UV}(u,v) \triangleq P[U \le u,\ V \le v]$.    (5.4)

There are several advantages in working with generators. One is that generators can be assumed symmetrically distributed, i.e. $F_{UV}(u,v) = F_{UV}(v,u)$, and therefore identically distributed, i.e. $F_U(u) = F_V(u)$. Here $F_U(u) \triangleq P[U \le u]$ and $F_V(v) \triangleq P[V \le v]$ are the marginal distribution functions of U and V. Moreover, for the generators we may assume independence together with identical distribution, while this is not possible for B and T, since $B \le T$.

It is also important to notice that there is no loss of generality in considering generators. In fact, given B and T we can always construct some generators U and V with the desired properties of symmetry by letting

$U = QB + (1-Q)T$,    (5.5a)

$V = (1-Q)B + QT$,    (5.5b)

where $Q \in \{0,1\}$ is a random variable, independent of B and T, with $P[Q = 0] = P[Q = 1] = 1/2$. It is easy to show that for such U and V the joint distribution is

$F_{UV}(u,v) = \frac{1}{2}\left(F_{BT}(u,v) + F_{BT}(v,u)\right)$,    (5.6)

and therefore $F_{UV}(u,v) = F_{UV}(v,u)$.
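The construction of Eqs. (5.5a)-(5.5b) can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, and the input law used in the loop (B and T taken as the sorted pair of two uniform draws) is chosen only to have something to sample from.

```python
import random


def make_generators(b: float, t: float, q: int) -> tuple[float, float]:
    """Eqs. (5.5a)-(5.5b): q = 1 gives (U, V) = (B, T), q = 0 gives (T, B)."""
    u = q * b + (1 - q) * t
    v = (1 - q) * b + q * t
    return u, v


def recover(u: float, v: float) -> tuple[float, float]:
    """Eqs. (5.3a)-(5.3b): bottom and top from the unordered pair."""
    return min(u, v), max(u, v)


rng = random.Random(0)  # fixed seed, for repeatability only
trials = 10_000
swapped = 0
for _ in range(trials):
    # Hypothetical input law: B <= T are two sorted uniform draws.
    b, t = sorted((rng.random(), rng.random()))
    u, v = make_generators(b, t, rng.randint(0, 1))
    assert recover(u, v) == (b, t)  # no information about the segment is lost
    swapped += u > v
print(swapped / trials)  # fraction of pairs with U > V
```

Because Q is a fair coin independent of (B, T), the pair (U, V) is as likely to be (B, T) as (T, B), which is the symmetry expressed by Eq. (5.6).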

A robustness result. A first advantage of the generator symmetry stems from a fact we have already stated in different form: the adjacency map is invariant under any continuous one-to-one transformation of the plane along the direction of the segments. Therefore we can study the problem by replacing the original ordinate y with a new ordinate z defined as

$z = F_U(y)$.    (5.7)

If $F_U$ is continuous (i.e. $P[U = y] = 0$ for all y), then the new generators

$U_z = F_U(U), \quad V_z = F_V(V)$    (5.8)

are uniform in the interval [0,1]. This is easily seen when $F_U$ is strictly increasing and therefore invertible. In fact $U_z \in [0,1]$ (see Figure 5.1), and if $u \in [0,1]$ then

$F_{U_z}(u) \triangleq P[U_z \le u] = P[F_U(U) \le u] = P[U \le F_U^{-1}(u)] = F_U(F_U^{-1}(u)) = u$,

which is the uniform distribution. If $F_U$ is constant on some interval, the inverse $F_U^{-1}$ is not properly defined, but the proof will still go through with $F_U^{-1}$ replaced by $F_U^{(-1)}(u) \triangleq \sup\{x \mid F_U(x) \le u\}$.


Figure 5.1. A random variable U transformed according to its own distribution $F_U$ becomes uniform in [0,1].

We  conclude  that  our  algorithm  presents  a  robustness  property,  its 
complexity  being  independent  of  the  first  order  distribution  of  the  generators. 
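The transformation of Eqs. (5.7)-(5.8) can be illustrated numerically. The sketch below is not taken from the thesis: it assumes, purely as an example, that the original ordinate is exponentially distributed, applies its own distribution function to each sample, and compares the empirical distribution of the result with the uniform one.

```python
import math
import random


def exp_cdf(y: float, rate: float = 1.0) -> float:
    """Distribution function of an exponential variable (illustrative choice)."""
    return 1.0 - math.exp(-rate * y) if y > 0 else 0.0


rng = random.Random(1)  # fixed seed, for repeatability only
# Z = F_U(U): each sample is mapped through its own distribution function.
sample = [exp_cdf(rng.expovariate(1.0)) for _ in range(20_000)]


def empirical_cdf(u: float) -> float:
    """Fraction of transformed samples not exceeding u."""
    return sum(z <= u for z in sample) / len(sample)


for u in (0.1, 0.25, 0.5, 0.9):
    print(u, round(empirical_cdf(u), 3))  # should be close to u itself
```

Any other continuous distribution could replace the exponential one here; only the relative order of the ordinates matters to the algorithm.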

Without loss of generality with respect to Assumption A1 we will develop the analysis under

Assumption B1: the process $s_j = (U_j, V_j)$ is a sequence of mutually independent and identically distributed random pairs such that:

(i) the joint distribution $F_{UV}(u,v) = F_{UV}(v,u)$ is symmetric;

(ii) the marginal distributions $F_U(u)$ and $F_V(v)$ are uniform in [0,1].


5.2.  The  Goals  of  the  Analysis;  Space  and  Time 

The first quantity we want to analyze is the number S of nodes in the search tree. For the case of n segments, if the algorithm cuts segment $s_j$ by means of $c_j$ medians, we have

$S = 3n + \sum_{j=1}^{n} c_j$.    (5.9)

In fact there are 2n ∇-nodes, one for each endpoint, and (c + 1) O-nodes for a segment cut c times. Denoting by E the expectation operator on random variables, our goal is to compute $E[c_j]$ to get

$E[S] = 3n + \sum_{j=1}^{n} E[c_j]$.    (5.10)

In our model, where n is infinite, Eqs. (5.9) and (5.10) are meaningless, but we can still compute $E[c_j]$. If this quantity turns out to be finite we can say that our algorithm has asymptotically linear storage, and we can estimate the storage, at least for large values of n, by the formula

$E[S] \approx (3 + E[c])\,n$.    (5.11)

Here and in the following we drop the subscript j in segment $s_j$ and related quantities such as $c_j$, since their statistical properties are independent of j due to the stationarity of the input process.

As  far  as  the  time  analysis  is  concerned,  we  have  seen  that  the 
worst-case  performance  is  O(nlogn)  both  for  presorting  and  SEARCHTREE. 
However  the  work  per  endpoint  in  each  recursive  call  of  SEARCHTREE  that 
processes  that  point  is  bounded  by  a  constant.  Therefore  if  we  can  prove 
that  the  average  number  of  calls  in  which  a  point  is  processed  is  constant 
we  can  conclude  that  SEARCHTREE  runs  in  expected  linear  time. 


5.3. Principal Slabs and Principal Medians

In order to analyze the algorithm we take a look at the way it works and introduce a suitable terminology to describe it. A slab whose y-interval is of the kind [0,y] or [y,1] is called external. The probability that there is a spanning segment in an external slab is zero. Hence the algorithm will process an external slab by cutting it at the median, which, with probability 1, is the arithmetic mean of bottom and top of the slab. For example the first cut will be at y = 1/2, and will generate the slabs [0,1/2] and [1/2,1]; slab [0,1/2] will be cut at y = 1/4, and so on. Cuts of external slabs will generate the following family of medians, which we call principal medians:

$\mathcal{M} = \{m_k \mid k \in Z\}$,    (5.12)

where $m_k = 2^{-(|k|+1)}$ for $k \le 0$ and $m_k = 1 - 2^{-(k+1)}$ for $k > 0$. Correspondingly the following principal slabs are formed:

$\mathrm{slab}(k) = \mathrm{slab}[m_k, m_{k+1}]$ for $k \le -1$, and $\mathrm{slab}(k) = \mathrm{slab}[m_{k-1}, m_k]$ for $k \ge 1$.    (5.13)

The principal slabs are shown in Figure 5.2.
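The principal medians and slabs are easy to tabulate. The following sketch uses exact rational arithmetic; the function names `median` and `slab` are illustrative choices, and the index convention (no slab with index 0) follows Eq. (5.13).

```python
from fractions import Fraction


def median(k: int) -> Fraction:
    """Principal median m_k: 2^-(|k|+1) for k <= 0, 1 - 2^-(k+1) for k > 0."""
    if k <= 0:
        return Fraction(1, 2 ** (1 - k))
    return 1 - Fraction(1, 2 ** (k + 1))


def slab(k: int) -> tuple[Fraction, Fraction]:
    """Principal slab(k) of Eq. (5.13); k = 0 is not a slab index."""
    if k == 0:
        raise ValueError("slab(k) is defined for k != 0 only")
    if k < 0:
        return median(k), median(k + 1)
    return median(k - 1), median(k)


for k in (-2, -1, 1, 2):
    low, high = slab(k)
    print(k, low, high, high - low)  # the width is 2^-(|k|+1)
```

The width of slab(k) halves each time |k| grows by one, so the slabs with |k| up to K cover all of [0,1] except two external strips of total width $2^{-K}$.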

It is convenient to analyze the procedure SEARCHTREE separating the stage in which the principal medians are formed from the subsequent processing of the interiors of the principal slabs. One reason for this approach is that (depending on the distribution $F_{UV}$) the principal slabs will generally be partitioned into rectangles by spanning segments. Medians in rectangles have only a local effect and need to be analyzed differently.


Figure 5.2. Principal slabs and principal medians. (The figure shows, from top to bottom, slab(2), slab(1), slab(-1) and slab(-2).)


According to this approach, let us consider a segment s = (U,V). The general situation is shown in Figure 5.3, where slab(α) and slab(β) are the principal slabs containing U and V respectively. We can express the number c of cuts performed by the algorithm on segment s as

$c = c_0 + c(U) + c(V)$,    (5.14)

where $c_0$ is the number of cuts due to the principal medians, and c(U) and c(V) are the numbers of cuts occurring in the subsequent processing of slab(α) and slab(β), respectively. When slab(α) = slab(β), or in other words U and V are in the same principal slab, clearly $c_0 = 0$, but we still have

Proof. The claim follows by considering that, if on one side of median m there are q points, median m cuts at most q segments. Therefore $m_0$ cuts at most n segments, $m_{-1}$ and $m_1$ cut at most n/2 segments each, and so on. Of course only a finite set of principal medians is to be considered. Thus

$\sum_{j=1}^{n} c_{0j} = \sum_{j=1}^{n} \sum_{m \in \mathcal{M}} c_{0j}(m) \le n + 2\left(\tfrac{1}{2}n + \tfrac{1}{4}n + \cdots\right) = 3n$,    (5.15)

where $c_{0j}(m) = 1$ if m cuts segment $s_j$, and $c_{0j}(m) = 0$ otherwise.

From (5.15) we get

$\frac{1}{n} \sum_{j=1}^{n} E[c_{0j}] \le 3$,    (5.16)

which in the stationary model, where $n \to \infty$ and $E[c_{0j}] \to E[c_0]$, becomes

$E[c_0] \le 3$.    (5.17)

To compute $E[c_0]$ exactly we write $c_0$ in terms of $c_0(m)$ as

$c_0 = \sum_{m \in \mathcal{M}} c_0(m)$.    (5.18)

Since $c_0(m) \in \{0,1\}$, we have

$E[c_0(m)] = P[c_0(m) = 1] = 2\left(F_{UV}(m,1) - F_{UV}(m,m)\right)$.    (5.19)

The last expression for $P[c_0(m) = 1]$ is obtained by observing that (see Figure 5.4)

$P[c_0(m) = 1] = P[U < m, V > m] + P[U > m, V < m]$,

and

$P[U < m, V > m] = P[U < m] - P[U < m, V \le m] = F_{UV}(m,1) - F_{UV}(m,m)$.

Hence, by the symmetry of generators, Eq. (5.19) follows.

Incidentally, we note that bound (5.21), derived here as a consequence of Proposition 1, can be obtained directly from Eq. (5.20). For this purpose, in Eq. (5.20) the range of summation, defined in Eq. (5.12), can be split into the sets $\{m_k \mid k \le 0\}$ and $\{m_k \mid k > 0\}$. The sum is then upper bounded using the fact that in the first set $F_{UV}(m_k,1) - F_{UV}(m_k,m_k) \le m_k = 2^{-(|k|+1)}$, and, in the second set, $F_{UV}(m_k,1) - F_{UV}(m_k,m_k) \le 1 - m_k = 2^{-(k+1)}$. The result (5.21) then follows by the sum of two geometric series.
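As a numerical illustration of Eq. (5.19), the sketch below sums the cut probabilities over the principal medians. It assumes independent uniform generators, i.e. $F_{UV}(u,v) = uv$ (the independence hypothesis introduced later in the text as Assumption A3); the function names are hypothetical.

```python
from fractions import Fraction


def median(k: int) -> Fraction:
    """Principal median m_k of Eq. (5.12)."""
    if k <= 0:
        return Fraction(1, 2 ** (1 - k))
    return 1 - Fraction(1, 2 ** (k + 1))


def cut_probability(m: Fraction) -> Fraction:
    """Eq. (5.19) with the assumed F_UV(u, v) = u * v: 2 * (m * 1 - m * m)."""
    return 2 * (m * 1 - m * m)


def expected_principal_cuts(depth: int) -> Fraction:
    """Sum of Eq. (5.19) over the principal medians m_k with |k| <= depth."""
    return sum((cut_probability(median(k)) for k in range(-depth, depth + 1)),
               Fraction(0))


for depth in (1, 5, 20):
    print(depth, float(expected_principal_cuts(depth)))
```

Under this assumption the partial sums increase towards 13/6, which is consistent with the general bound $E[c_0] \le 3$.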

5.5.  Analysis  of  Rectangles  in  Principal  Slabs 

We still have to analyze the contribution of c(U) and c(V) to c, in Eq. (5.14). For this task we need to consider the work done by the algorithm when processing the interiors of the principal slabs. In general a given generator, say U, will be contained in a rectangle $\mathcal{R}$ formed by the boundary medians of a principal slab together with two spanning segments. Since the algorithm processes the rectangles independently of each other, c(U) is determined only by the contents of $\mathcal{R}$. Intuition suggests that, on the average, c(U) increases with the total number W of points in $\mathcal{R}$. This we shall now analyze.

Analysis of segments. We begin by studying the relative position of a segment and a generic slab [m,m']. We define the sets of segments that respectively: (i) span the slab, (ii) have one endpoint in the slab, (iii) have two endpoints in the slab, or (iv) are outside the slab. More formally,

$\mathcal{S} \triangleq \{s \mid B < m,\ m' < T\}$,    (5.22a)

$\mathcal{S}_1 \triangleq \{s \mid B < m < T < m' \text{ or } m < B < m' < T\}$,    (5.22b)

$\mathcal{S}_2 \triangleq \{s \mid m < B,\ T < m'\}$,    (5.22c)

$\mathcal{O} \triangleq \{s \mid T < m \text{ or } B > m'\}$.    (5.22d)

It is also useful to let

$\mathcal{J} \triangleq \mathcal{S} \cup \mathcal{S}_1 \cup \mathcal{S}_2$    (5.22e)

as the set of segments which have non-trivial intersection with the slab [m,m']. Of course all the sets defined by Eqs. (5.22a)-(5.22e) are functions of m and m', but, for simplicity, we do not show it in the notation.

In  Figure  5.5  typical  examples  are  given  for  segments  in  all  classes,  and 
in  Figure  5.6  the  classes  are  shown  in  the  (U,V)  plane. 

The probability that a segment s belongs to one of the classes we have just defined can be expressed in terms of $F_{UV}$ as follows.

$P[s \in \mathcal{S}] = 2\left(F_{UV}(m,1) - F_{UV}(m,m')\right) = 2\left(m - F_{UV}(m,m')\right)$,    (5.23a)

$P[s \in \mathcal{S}_1] = 2\left(F_{UV}(m',1) - F_{UV}(m,1)\right) - 2P[s \in \mathcal{S}_2] = 2\left(m' - m - P[s \in \mathcal{S}_2]\right)$,    (5.23b)

$P[s \in \mathcal{S}_2] = F_{UV}(m',m') - 2F_{UV}(m,m') + F_{UV}(m,m)$,    (5.23c)

$P[s \in \mathcal{O}] = F_{UV}(m,m) + F_{UV}(m',m') - 2F_{UV}(m',1) + 1 = 1 - 2m' + F_{UV}(m',m') + F_{UV}(m,m)$,    (5.23d)

$P[s \in \mathcal{J}] = 1 - P[s \in \mathcal{O}] = 2m' - F_{UV}(m',m') - F_{UV}(m,m)$.    (5.23e)


When we focus our attention on the slab [m,m'] we are interested only in segments belonging to $\mathcal{J}$. So we introduce the following conditional probabilities, which are well defined since $P[s \in \mathcal{J}] > 0$:

$p_0 \triangleq P[s \in \mathcal{S} \mid s \in \mathcal{J}] = P[s \in \mathcal{S}] / P[s \in \mathcal{J}]$,    (5.24a)

$p_1 \triangleq P[s \in \mathcal{S}_1 \mid s \in \mathcal{J}] = P[s \in \mathcal{S}_1] / P[s \in \mathcal{J}]$,    (5.24b)

$p_2 \triangleq P[s \in \mathcal{S}_2 \mid s \in \mathcal{J}] = P[s \in \mathcal{S}_2] / P[s \in \mathcal{J}]$.    (5.24c)
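Equations (5.23) and (5.24) can be evaluated directly once a joint distribution is fixed. The sketch below assumes independent uniform generators, $F_{UV}(u,v) = uv$, which is only one admissible choice; the function names and dictionary keys are illustrative.

```python
from fractions import Fraction


def F(u: Fraction, v: Fraction) -> Fraction:
    """Assumed joint distribution: independent uniform generators."""
    return u * v


def class_probabilities(m: Fraction, m2: Fraction) -> dict[str, Fraction]:
    """Eqs. (5.23a)-(5.23e) for the slab [m, m2], with 0 <= m < m2 <= 1."""
    span = 2 * (m - F(m, m2))                        # (5.23a) spanning
    both = F(m2, m2) - 2 * F(m, m2) + F(m, m)        # (5.23c) two endpoints inside
    one = 2 * (m2 - m - both)                        # (5.23b) one endpoint inside
    outside = 1 - 2 * m2 + F(m2, m2) + F(m, m)       # (5.23d) no intersection
    return {"span": span, "one": one, "both": both,
            "outside": outside, "intersecting": 1 - outside}  # (5.23e)


def conditional(probs: dict[str, Fraction]) -> tuple[Fraction, Fraction, Fraction]:
    """Eqs. (5.24a)-(5.24c): p0, p1, p2 given that the segment meets the slab."""
    total = probs["intersecting"]
    return probs["span"] / total, probs["one"] / total, probs["both"] / total


probs = class_probabilities(Fraction(1, 2), Fraction(3, 4))  # slab(1)
print(probs)
print(conditional(probs))
```

For slab(1) this gives $p_0 = 4/11$, $p_1 = 6/11$ and $p_2 = 1/11$; the three conditional probabilities always add up to one.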

To guarantee the existence (with probability 1) of rectangles in the principal slabs, we introduce the following assumption, which is satisfied by most reasonable distributions.

Assumption A2: the distribution $F_{UV}$ is such that probability $p_0$ is nonzero for all the principal slabs.

Analysis of rectangles. Let us now consider a generator of s, say $U \in \mathrm{slab}(i)$. Let $s_L$ and $s_R$ be the left and the right segments closest to U among those which span slab(i), and call $\mathcal{R}$ the rectangle enclosed by these two segments.

The number W of segment endpoints in $\mathcal{R}$ can be written as

$W = N_L + N_R + 1 + \phi(V)$,    (5.25)

where $N_L$ is the number of points in $\mathcal{R}$ to the left of U, $N_R$ is the number of those to the right, the third term, 1, accounts for U itself, and $\phi(V)$ is 1 if V, the other generator of s, is also in slab(i) and 0 otherwise. An example is given in Figure 5.7, where $V \notin \mathrm{slab}(i)$ so that $\phi(V) = 0$, $N_L = 2$, $N_R = 4$, and $W = 2 + 4 + 1 + 0 = 7$.

Now we want to find the probability mass function (p.m.f.) of W. It is useful to partition the sample space into regions where U belongs to a given slab. The theorem on total probability for this partition yields

$P[W = w] = \sum_{|i|=1}^{+\infty} P[W = w \mid U \in \mathrm{slab}(i)]\, P[U \in \mathrm{slab}(i)]$.    (5.26)

Figure 5.7. The rectangle $\mathcal{R}$ including generator U.

Since U is uniform in [0,1], $P[U \in \mathrm{slab}(i)]$ equals the width of the slab, and we only need to compute $P[W = w \mid U \in \mathrm{slab}(i)]$. First let us observe that, under the condition $U \in \mathrm{slab}(i)$, $N_L$, $N_R$ and $\phi(V)$ are independent random variables. Also let us denote for brevity $P[\,\cdot \mid U \in \mathrm{slab}(i)]$ as $P_i[\,\cdot\,]$. Then we have

$P_i[W = w] = P_i[N_L + N_R = w-2]\, P_i[V \in \mathrm{slab}(i)] + P_i[N_L + N_R = w-1]\, P_i[V \notin \mathrm{slab}(i)]$.    (5.27)

Now we have

$P_i[V \in \mathrm{slab}(i)] = P[V \in \mathrm{slab}(i) \mid U \in \mathrm{slab}(i)] = P[U,V \in \mathrm{slab}(i)] / P[U \in \mathrm{slab}(i)] = P[s \in \mathcal{S}_2]\, 2^{|i|+1}$;    (5.28)

$P_i[V \notin \mathrm{slab}(i)] = 1 - P_i[V \in \mathrm{slab}(i)]$.    (5.29)


It remains to evaluate $P_i[N_L + N_R = w]$. Since $N_L$ and $N_R$ are independent of each other, the p.m.f. of the sum is the discrete convolution of the p.m.f.'s of the addends $N_L$ and $N_R$. So let us compute $P_i[N_R = w]$, which also equals $P_i[N_L = w]$ by the symmetry of the configuration. Referring to Figure 5.7, we can define $L_1$ and $L_2$ as the numbers of segments in classes $\mathcal{S}_1$ and $\mathcal{S}_2$ of slab(i) that are between U and $s_R$. The number of endpoints at the right of U is obviously $N_R = L_1 + 2L_2$, and therefore

$P_i[N_R = w] = \sum_{h=0}^{\lfloor w/2 \rfloor} P_i[L_1 = w-2h,\ L_2 = h]$.    (5.30)

Since there are $\binom{w-h}{h}$ ways to form a string of w - 2h segments of $\mathcal{S}_1$ and h segments of $\mathcal{S}_2$, followed by a segment of $\mathcal{S}$,

$P_i[L_1 = w-2h,\ L_2 = h] = p_0 \binom{w-h}{h} p_1^{\,w-2h}\, p_2^{\,h}$.    (5.31)

Thus, by Eqs. (5.30) and (5.31), $P_i[N_R = w]$ can be expressed in terms of the probabilities $p_0$, $p_1$, $p_2$ of slab(i). Finally

$P_i[N_L + N_R = w] = \sum_{h=0}^{w} P_i[N_L = h]\, P_i[N_R = w-h]$    (5.32)

is the announced convolution for the p.m.f. of $N_L + N_R$.

Summary. Equations (5.23) and (5.24) give the probabilities $p_0$, $p_1$, $p_2$, for a given slab, in terms of $F_{UV}$. Equations (5.30), (5.31) and (5.32) give the p.m.f. of $N_L + N_R$. Equations (5.28), (5.29) and (5.32) substituted in (5.27) yield the conditional p.m.f. $P[W = w \mid U \in \mathrm{slab}(i)]$. Finally Eq. (5.26) is the desired p.m.f. of W.
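The chain of equations summarized above translates into a short computation. The following sketch evaluates Eqs. (5.30)-(5.32) and (5.27) for one slab; the numerical values of $p_0$, $p_1$, $p_2$ and of $P_i[V \in \mathrm{slab}(i)]$ are those of slab(1) under the assumption of independent uniform generators, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pmf_nr(w: int, p0: Fraction, p1: Fraction, p2: Fraction) -> Fraction:
    """Eqs. (5.30)-(5.31): P_i[N_R = w]."""
    return sum((p0 * comb(w - h, h) * p1 ** (w - 2 * h) * p2 ** h
                for h in range(w // 2 + 1)), Fraction(0))


def pmf_sum(w: int, p0: Fraction, p1: Fraction, p2: Fraction) -> Fraction:
    """Eq. (5.32): P_i[N_L + N_R = w] as a discrete convolution."""
    return sum((pmf_nr(h, p0, p1, p2) * pmf_nr(w - h, p0, p1, p2)
                for h in range(w + 1)), Fraction(0))


def pmf_w(w: int, p0: Fraction, p1: Fraction, p2: Fraction,
          p_v_in: Fraction) -> Fraction:
    """Eq. (5.27): P_i[W = w], where p_v_in = P_i[V in slab(i)]."""
    with_v = pmf_sum(w - 2, p0, p1, p2) if w >= 2 else Fraction(0)
    without_v = pmf_sum(w - 1, p0, p1, p2) if w >= 1 else Fraction(0)
    return with_v * p_v_in + without_v * (1 - p_v_in)


# Assumed values: slab(1) with independent uniform generators.
p0, p1, p2 = Fraction(4, 11), Fraction(6, 11), Fraction(1, 11)
p_v_in = Fraction(1, 16) * 4  # Eq. (5.28): P[both endpoints in slab] * 2^(|i|+1)
total = sum(pmf_w(w, p0, p1, p2, p_v_in) for w in range(1, 40))
print(float(total))  # the truncated p.m.f. should sum to almost 1
```

Averaging such conditional p.m.f.'s over the slabs with weights $P[U \in \mathrm{slab}(i)]$, as in Eq. (5.26), would give the unconditional p.m.f. of W.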


On the number of segments in a rectangle. While we will use the probabilistic characterization of W, the number of points in $\mathcal{R}$, for a close estimate of E[c(U)], the number of segments in $\mathcal{R}$ turns out to be more manageable when we are only concerned with asymptotic performance. On the other hand, Lemma 4.1 allows us to interchange points and segments when we can neglect multiplicative constants.

Given $U \in \mathrm{slab}(i)$, we consider the number Q of segments that are in class $\mathcal{J}$ for slab(i) and that are between $s_L$ and U, or U and $s_R$ (see Figure 5.7); the segment s generated by U and V is not counted, for simplicity. If we define $Q_L$ and $Q_R$ to be the numbers of segments in $\mathcal{R}$ respectively to the left and to the right of s, we can write

$Q = Q_L + Q_R$.    (5.33)

In Figure 5.7 $Q_L = 2$ and $Q_R = 3$. It is easy to see that $Q_R$ is a geometrically distributed random variable, i.e.

$P_i[Q_R = q] = p_0 (1-p_0)^q, \quad q = 0, 1, \ldots$,    (5.34)

since $Q_R = q$ if and only if s is followed by q nonspanning segments and then by one which is spanning. $Q_L$ is independent of, and distributed as, $Q_R$. Therefore the p.m.f. of Q is

$P_i[Q = q] = \sum_{h=0}^{q} P_i[Q_L = h]\, P_i[Q_R = q-h] = p_0^2\, (q+1)(1-p_0)^q$.    (5.35)
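The closed form in Eq. (5.35) can be checked against the convolution it comes from. The sketch below uses exact fractions; the value chosen for $p_0$ is an arbitrary example and the function names are illustrative.

```python
from fractions import Fraction


def geometric(q: int, p0: Fraction) -> Fraction:
    """Eq. (5.34): P_i[Q_R = q] = p0 * (1 - p0)^q."""
    return p0 * (1 - p0) ** q


def pmf_q_convolution(q: int, p0: Fraction) -> Fraction:
    """First form of Eq. (5.35): convolution of the laws of Q_L and Q_R."""
    return sum((geometric(h, p0) * geometric(q - h, p0) for h in range(q + 1)),
               Fraction(0))


def pmf_q_closed(q: int, p0: Fraction) -> Fraction:
    """Second form of Eq. (5.35): p0^2 * (q + 1) * (1 - p0)^q."""
    return p0 ** 2 * (q + 1) * (1 - p0) ** q


p0 = Fraction(4, 11)  # hypothetical value of the spanning probability
for q in range(5):
    print(q, pmf_q_convolution(q, p0), pmf_q_closed(q, p0))
```

Each of the q + 1 terms of the convolution equals $p_0^2 (1-p_0)^q$, which is where the factor (q + 1) comes from.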
5.6. Bounds for E[c(U)]

The exact calculation of E[c(U)] would require us to solve quite difficult combinatorial problems, so we limit ourselves to obtaining bounds on either side. First we give a general upper bound which can be used, for a broad class of generator distributions $F_{UV}$, to show that E[c(U)] is finite and therefore that the algorithm achieves expected linear storage. Second we derive a tighter upper bound and a lower bound for the case of independent generators.

All our bounds will be based on the total probability expansion of E[c(U)] with respect to the possible values of W:

$E[c(U)] = \sum_{w=1}^{+\infty} E[c(U) \mid W = w]\, P[W = w]$.    (5.36)

We already know how to compute $P[W = w]$, and we will use suitable upper and lower bounds for $E[c(U) \mid W = w]$.

A first upper bound. We have shown in Section 4 that the total number of cuts in processing a rectangle with w points is bounded by f(w), where f is defined by Eqs. (4.1a)-(4.1c). Since the total number of cuts in a rectangle is the sum of the cuts associated with all the endpoints within the rectangle, and c(U) has the same statistical properties for all generators, we conclude that

$E[c(U) \mid W = w] \le f(w)/w$.    (5.37)

Use of bound (5.37) in Eq. (5.36) yields

$E[c(U)] \le \sum_{w=1}^{+\infty} P[W = w]\, f(w)/w$.    (5.38)


Discussion of convergence. It can be easily shown that $f(w)/w = O(\log w)$, and actually that $f(w)/w < \log_2 w$. Then, considering that the series $\sum_{w=2}^{+\infty} 1/(w (\log w)^{\lambda})$ converges if $\lambda > 1$ and diverges if $\lambda \le 1$, we see that a sufficient condition for the right-hand side of (5.38) to converge is that $P[W = w] = O(1/(w (\log w)^{2+\epsilon}))$ for some $\epsilon > 0$. The intuitive meaning of this condition is that rectangles with many points should not be too likely. This statement, however, holds only in a very weak sense. For we could have $E[W] = +\infty$ and still a finite E[c(U)]! (One instance of this case is $P[W = w] = \Theta(1/(w (\log w)^3))$.) We have seen in the last section that the dependence of $P[W = w]$ on the distribution of the generators is not straightforward, and this makes it difficult to restate the convergence condition on $P[W = w]$ in terms of $F_{UV}(u,v)$.

We can get more insight by reasoning in terms of the number of segments Q defined in (5.33), whose p.m.f. has a simple expression. We can write

$E[c(U)] = \sum_{q=0}^{+\infty} P[Q = q]\, E[c(U) \mid Q = q] = \sum_{|i|=1}^{+\infty} P[U \in \mathrm{slab}(i)] \sum_{q=0}^{+\infty} P_i[Q = q]\, E[c(U) \mid Q = q]$,    (5.39)

where we have expanded $P[Q = q]$ according to the total probability theorem, and we have interchanged the order of summation. The last operation is allowed since we deal with series of positive terms. Now, if Q = 0, $\mathcal{R}$ contains only one segment and therefore there are no cuts ($E[c(U) \mid Q = 0] = 0$). For q > 0, the average number of cuts $E[c(U) \mid Q = q]$ is obviously O(log q) and therefore it can be upper bounded by kq, where k > 0 is a suitable constant. So, after using Eq. (5.35), we get


$\sum_{q=0}^{+\infty} P_i[Q = q]\, E[c(U) \mid Q = q] \le k\, p_{0i}^2 \sum_{q=1}^{+\infty} (q+1)\, q\, (1-p_{0i})^q$.    (5.40)

The last series can be summed in closed form, yielding

$\sum_{q=0}^{+\infty} P_i[Q = q]\, E[c(U) \mid Q = q] \le k\, p_{0i}^2\, \frac{2}{p_{0i}^3} = \frac{2k}{p_{0i}}$.    (5.41)

Note that we have added the subscript i to $p_0$, to stress its dependence on the slab. Substitution of bound (5.41) in Eq. (5.39) gives

$E[c(U)] \le 2k \sum_{i=1}^{+\infty} 2^{-(i+1)} \left( \frac{1}{p_{0i}} + \frac{1}{p_{0(-i)}} \right) = k \sum_{i=1}^{+\infty} \frac{2^{-i}}{p_{0i}} + k \sum_{i=1}^{+\infty} \frac{2^{-i}}{p_{0(-i)}}$.    (5.42)

A sufficient condition for convergence of the last term, and therefore for finiteness of E[c(U)], is that

$p_{0i} > k'\, 2^{-i}\, i\, (\log i)^{1+\epsilon}, \quad i \ge 1$    (5.43)

for some $k' > 0$ and some $\epsilon > 0$, and a similar condition for $i \le -1$.

Condition (5.43) is not very restrictive at all. It is intuitive that a lower bound on the probabilities of spanning segments prevents rectangles with many segments. The lower bound is decreasing with |i| because slabs with fewer points have less weight in the contribution to the number of cuts.


Independent generators. To get less crude bounds than the one expressed by Eq. (5.38), we need to further specify the joint distribution of the generators. We introduce

Assumption A3 (independence). U and V are statistically independent, with joint distribution

$F_{UV}(u,v) = F_U(u)\, F_V(v) = uv, \quad u,v \in [0,1]$.    (5.44)

The hypothesis of independence considerably simplifies the analysis. Here we see another advantage of defining generators; in fact, as already noted, we could not directly assume independence together with identical distribution for the bottom B and the top T of segment s, since $B \le T$.

Let $\mathcal{R}$ be a rectangle processed by the algorithm in slab [m,m'], not necessarily a principal one. In any case $\mathcal{R}$ will be completely below ($m' \le 1/2$) or completely above ($m \ge 1/2$) the median $m_0 = 1/2$. The median M of the points in $\mathcal{R}$ divides $\mathcal{R}$ into two parts: we call internal the one closer to $m_0 = 1/2$ and external the other one. So the internal part is the one above M for rectangles below $m_0$, and the one below M for rectangles above $m_0$.

Let $z_1, \ldots, z_w$ be the endpoints inside $\mathcal{R}$. Without loss of generality suppose that $\mathcal{R}$ is above $m_0$, as shown in Figure 5.8. If $z_j \in \mathcal{R}$ is a generator of segment $(z_j, \bar z_j)$, this segment is cut by median M if and only if $z_j < M$ and $\bar z_j > M$, or $z_j > M$ and $\bar z_j < M$. If we call c(z;M) the number of cuts due to M and charged to z, we have:

$P[c(z;M) = 1] = \begin{cases} M, & z > M \\ 0, & z = M \\ 1-M, & z < M \end{cases}$    (5.45)


Figure 5.8. A rectangle $\mathcal{R}$ above $m_0$ divided into internal and external parts by median M. A point z forms a segment cut by M if the companion point $\bar z$ is on the other side of the median.


In  general  we  have 


P[c(z;M)  «  1]  <  0 


,  z  external 
,  z  -  M 
,  z  internal 


(5.46 


where  pj  ^  p^,  and  Pj  +  p£  ■  1.  The  variable 

A  w 

D  ■  Z  c(z,;M)  (5.47 

j-1  J 

is  an  upper  bound  to  the  number  of  cuts  due  to  M,  since,  if  z  and  z  are 
both  in  ft,  the  cuts  of  segment  (z,z)  are  counted  twice  in  our  census  scheme. 
To  compute  E[D],  let  us  suppose  that  ft  contains  w  points,  n^  in  the  internal 
part  and  in  the  external  one  (.  »  n_  +  rt_  +  1).  With  this  notation 
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w 

E[D]  -  I  E[c(2  ;M>] 
j-1  J 


-  I  P[c(z.;M) 
j-1  J 


11  -  Vl  +  . 


(5.48) 


If we choose the median $M$ such that $n_I = n_E = (w-1)/2$ when $w$ is odd, and
$n_I = n_E + 1 = w/2$ when $w$ is even, we have that

$$
E[D] =
\begin{cases}
(w-1)/2, & w \text{ odd} \\
w/2 - 1 + p_I, & w \text{ even}
\end{cases}
\qquad (5.49)
$$

and in general, since $p_I \le 1/2$,

$$
w/2 - 1 \le E[D] \le (w-1)/2 . \qquad (5.50)
$$
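The case analysis in Eqs. (5.49) and (5.50) can be checked mechanically. The following is a minimal sketch, not the author's code: the function name and the use of exact fractions are illustrative choices, and the only inputs are the number of points w and the probability attached to an internal point (the two probabilities are assumed to sum to 1, as stated above).

```python
from fractions import Fraction


def expected_cuts_charged(w, p_internal):
    """Expected census count E[D] for a rectangle holding w points.

    Assumes the median choice of Eq. (5.49): the internal part holds
    ceil((w-1)/2) points and the external part floor((w-1)/2) points.
    """
    p_external = 1 - p_internal      # the two probabilities sum to 1
    n_internal = w // 2              # ceil((w-1)/2)
    n_external = (w - 1) // 2        # floor((w-1)/2)
    # The median point itself accounts for the remaining point.
    assert n_internal + n_external + 1 == w
    return n_internal * p_internal + n_external * p_external


if __name__ == "__main__":
    for w in (5, 6):
        for p in (Fraction(0), Fraction(1, 4), Fraction(1, 2)):
            print(w, p, expected_cuts_charged(w, p))
```

For odd w the result does not depend on the probability, while for even w it moves between the two ends of the interval in Eq. (5.50) as the internal probability goes from 0 to 1/2.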


The number of cuts in a given rectangle $\mathcal{R}$ equals the number of cuts due to
the median $M$ plus the number of cuts occurring in the two regions obtained
by cutting $\mathcal{R}$ with $M$. In the worst case such regions will consist of only
one rectangle, while in general spanning segments will form several
rectangles in each region. Based on these considerations we can
recursively define an upper bound $g(w)$ for the average number of cuts
occurring in a rectangle with $w$ points.


$$
g(0) = 0, \quad g(1) = 0, \qquad (5.51a)
$$

$$
g(w) = (w-1)/2 + g(n_I) + g(n_E), \qquad (5.51b)
$$

where, as we have said, $n_E = \lfloor (w-1)/2 \rfloor$, $n_I = \lceil (w-1)/2 \rceil$. Using the bound
$g(w)/w$ for $E[c(U) \mid W = w]$ in Eq. (5.36) we get

$$
E[c(U)] \le \sum_{w=1}^{+\infty} P[W = w]\, g(w)/w . \qquad (5.52)
$$
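The recurrence (5.51) is easy to tabulate. The sketch below is an illustration only: it evaluates g(w) exactly with memoization and prints the per-point ratio g(w)/w that enters the bound (5.52). The function name `g` mirrors the text; everything else is a hypothetical choice.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def g(w):
    """Upper bound of Eq. (5.51) on the average number of cuts in a rectangle with w points."""
    if w <= 1:
        return Fraction(0)               # g(0) = g(1) = 0
    smaller = (w - 1) // 2               # floor((w-1)/2)
    larger = w // 2                      # ceil((w-1)/2)
    # Cuts charged to the median, plus the cuts in the two parts it creates.
    return Fraction(w - 1, 2) + g(larger) + g(smaller)


if __name__ == "__main__":
    for w in (1, 2, 3, 4, 8, 16, 64, 256):
        print(w, g(w), float(g(w) / w))
```

Since g is symmetric in its two recursive arguments, which part is called internal and which external does not affect the value.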

To get a lower bound for $E[c(U) \mid W = w]$ we count only the cuts due to the
first median in rectangles of principal slabs. We already have a lower
bound for $E[D]$, but since $D$ accounts twice for cuts of segments with both
endpoints in the same rectangle, we have to subtract this contribution.

Since $P[s \in \text{slab}(i)] = 4^{-(|i|+1)}$ for slab($i$), the probability of segments with
both endpoints in the same slab is

$$
2 \sum_{i=1}^{\infty} 4^{-(i+1)} = 1/6 . \qquad (5.53)
$$
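As a worked check, the value 1/6 follows from summing the geometric series with first term $4^{-2}$ and ratio $4^{-1}$:

```latex
2\sum_{i=1}^{\infty} 4^{-(i+1)}
  = 2 \cdot \frac{4^{-2}}{1 - 4^{-1}}
  = 2 \cdot \frac{1/16}{3/4}
  = 2 \cdot \frac{1}{12}
  = \frac{1}{6}.
```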

So, with probability 1/6, in $E[D]$ we charge to an endpoint 1/4 cut more
than is due, and we have

$$
(w/2 - 1)/w - 1/24 \le E[c(U) \mid W = w] , \qquad (5.54)
$$

whence

$$
E[c(U)] \ge \frac{11}{24} - \sum_{w=1}^{+\infty} P[W = w]/w . \qquad (5.55)
$$

5.7. Independent Generators: Numerical Results and Simulations

We now specialize the preceding analysis to the case of independent
generators in order to get numerical results. We also compare such results
with those obtained by actually running the algorithm on suitable random
inputs.

In the sequel, Assumptions B1 and A3 are used. Assumption A2 is
satisfied as a consequence.

First we plug Eq. (5.44) into Eq. (5.20), and, by summing the resulting
geometric series, we get

$$
E[c_\Pi] = \sum_{m \in \Pi} 2(m - m^2) = 13/6 . \qquad (5.56)
$$
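The value 13/6 can be reproduced numerically. The sketch below rests on an assumption that is not restated in this section: that the principal medians are 1/2 together with the points 2^-k and 1 - 2^-k for k = 2, 3, ..., which is the layout consistent with a slab of index i having probability 2^-(|i|+1). The function name and the truncation depth are illustrative.

```python
from fractions import Fraction


def principal_cut_sum(depth):
    """Partial sum of 2(m - m^2) over the assumed principal medians.

    Assumption: medians are 1/2, and 2**-k and 1 - 2**-k for k = 2..depth.
    """
    half = Fraction(1, 2)
    total = 2 * (half - half * half)          # contribution of m = 1/2
    for k in range(2, depth + 1):
        m = 1 - Fraction(1, 2 ** k)
        # m and its mirror image 1 - m give the same value of 2(m - m^2).
        total += 2 * 2 * (m - m * m)
    return total


if __name__ == "__main__":
    for depth in (2, 5, 10, 40):
        print(depth, float(principal_cut_sum(depth)))
    print("13/6 =", 13 / 6)
```

Under this assumption the partial sums increase toward 13/6, in agreement with the closed form quoted in the text.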

Then, from Eqs. (5.23) and (5.24), or from a direct argument, we can see
that, for slab($\pm i$) (for brevity we use the symbol $\ell$ defined as
$\ell \triangleq 2^{-(1+|i|)}$), we have

$$
p_0 = 2(1 - 2\ell)/(4 - 5\ell) , \qquad (5.57a)
$$

$$
p_1 = 2(1 - \ell)/(4 - 5\ell) , \qquad (5.57b)
$$

$$
p_2 = \ell/(4 - 5\ell) . \qquad (5.57c)
$$
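One possible form of the "direct argument" is sketched below. It assumes, as an illustration rather than as the author's derivation, that the two endpoints are independent and uniform on [0, 1], that the slab of index i occupies the interval [1 - 2l, 1 - l] with l = 2^-(1+|i|), and that p_0, p_1, p_2 are the probabilities that a segment meeting the slab has 0, 1 or 2 endpoints inside it. The function names are hypothetical.

```python
from fractions import Fraction


def slab_probabilities(i):
    """Return (p0, p1, p2) for the slab of index i under the stated assumptions."""
    ell = Fraction(1, 2 ** (1 + abs(i)))
    low, high = 1 - 2 * ell, 1 - ell           # assumed slab boundaries
    width = high - low                         # equals ell
    spanning = 2 * low * (1 - high)            # one endpoint below, one above the slab
    one_inside = 2 * width * (1 - width)       # exactly one endpoint in the slab
    both_inside = width * width                # both endpoints in the slab
    total = spanning + one_inside + both_inside  # segment meets the slab
    return spanning / total, one_inside / total, both_inside / total


def closed_form(i):
    """Eqs. (5.57a)-(5.57c) as reconstructed in the text."""
    ell = Fraction(1, 2 ** (1 + abs(i)))
    denom = 4 - 5 * ell
    return 2 * (1 - 2 * ell) / denom, 2 * (1 - ell) / denom, ell / denom


if __name__ == "__main__":
    for i in (1, 2, 3):
        print(i, slab_probabilities(i), closed_form(i))
```

Under these assumptions the two computations agree and the three probabilities sum to 1, which is a useful sanity check on the formulas.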

Using these values for $p_0$, $p_1$, $p_2$ we can compute $P[W = w]$, according to the
procedure outlined in Section 5.5. Then we proceed to the numerical
evaluation of bounds (5.52) and (5.55), obtaining

$$
0.18 \le E[c(U)] \le 0.45 . \qquad (5.58)
$$

Using these results in Eq. (5.14) we get

$$
2.53 \le E[c] \le 3.07 , \qquad (5.59)
$$

and correspondingly Eq. (5.11) becomes

$$
5.53 \le E[S]/n \le 6.07 . \qquad (5.60)
$$
The algorithm has been coded and run on random inputs. The
independent endpoints, U and V, have been generated by a congruential method
[7]. Particular care has been taken in choosing generators of the same
segment, taking them far apart from each other in the "random" sequence given
by the congruential method, in order to avoid a possible correlation.
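The sampling scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the generator of reference [7]: the multiplier, increment and modulus are the widely quoted Numerical Recipes constants, and the class name, function name and `gap` parameter are hypothetical. The two endpoints of each segment are taken from positions of the sequence that are far apart.

```python
class Lcg:
    """Linear congruential generator x -> (a*x + c) mod m.

    The default parameters are an assumption (Numerical Recipes constants),
    not the ones used in the original experiments.
    """

    def __init__(self, seed, a=1664525, c=1013904223, m=2 ** 32):
        self.state = seed % m
        self.a, self.c, self.m = a, c, m

    def next_uniform(self):
        """Advance the sequence and return a value in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


def random_segments(n, seed=1, gap=10000):
    """Return n endpoint pairs (u, v) whose values lie n + gap steps apart."""
    stream = Lcg(seed)
    values = [stream.next_uniform() for _ in range(2 * n + gap)]
    first = values[:n]                    # one endpoint of each segment
    second = values[n + gap:]             # companion endpoints, far down the sequence
    return list(zip(first, second))


if __name__ == "__main__":
    for u, v in random_segments(5):
        print(round(u, 4), round(v, 4))
```

Separating the two streams by a fixed lag is only one way to reduce correlation between companion endpoints; the original text does not say how large the separation was.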

The results of the simulation completely agree with the theoretical
analysis. For $n$ ranging from 300 to 3000, $S/n$ has always been very close
to 5.7, so that experiments suggest $S \approx 5.7\, n$.

Another interesting result of simulation is that, on the average, the
depth of the tree is about $2 \log_2 n$, actually with a very small variance.

5.8.  Average  Time 

We have seen in Section 5.2 that the average running time of SEARCHTREE
is linear if the average number of calls in which a given endpoint is
processed is a (finite) constant.

Let $H$ denote the number of calls in which the generator $U$ is involved.
If $U \in \text{slab}(i)$, then it will be processed $H_1 = |i| + 1$ times during the
formation of principal slabs, and $H_2$ times during the processing of the
interior of slab($i$). So $H = H_1 + H_2$ and $E[H] = E[H_1] + E[H_2]$. $E[H_1]$ is
finite and can actually be exactly computed as

$$
E[H_1] = \sum_{|i|=1}^{+\infty} E[H_1 \mid U \in \text{slab}(i)]\, P[U \in \text{slab}(i)]
       = \sum_{i=1}^{+\infty} (i+1)\, 2^{-i} = 3 . \qquad (5.61)
$$
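The sum in Eq. (5.61) can be verified with exact arithmetic. The sketch below assumes, consistently with the formula, that the slab of index i (positive or negative) is hit with probability 2^-(|i|+1) and costs |i| + 1 calls; the function name is illustrative.

```python
from fractions import Fraction


def expected_h1(max_index):
    """Partial sum of E[H_1] over slabs with 1 <= |i| <= max_index."""
    total = Fraction(0)
    for i in range(1, max_index + 1):
        probability = Fraction(1, 2 ** (i + 1))   # P[U in slab(i)] = P[U in slab(-i)]
        total += 2 * (i + 1) * probability        # the factor 2 covers both signs of i
    return total


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 30):
        print(k, float(expected_h1(k)))
```

The partial sums equal 3 - (K + 3)/2^K for truncation index K, so they approach 3 geometrically fast.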

To show that $E[H_2]$ is in general finite we use the expansion

$$
E[H_2] = \sum_{w=1}^{+\infty} E[H_2 \mid W = w]\, P[W = w] . \qquad (5.62)
$$

Now, $E[H_2 \mid W = w] = O(\log w)$, since at any call involving the same point the
number of points processed is at least halved. Therefore we can apply to the
series (5.62) considerations similar to those made in Section 5.6 about
Eq. (5.38).
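The halving argument can be made concrete with a small count. The sketch below is a simplified worst-case model, not the SEARCHTREE procedure itself: it assumes the point always falls in the larger part produced by the median, that this part holds floor(w/2) of the w points, and that spanning segments never split the rectangle further. The function name is hypothetical.

```python
def calls_in_rectangle(w):
    """Number of calls that can involve one point in a rectangle of w points.

    Simplified model: each call removes the median and keeps the point in the
    larger part, which holds ceil((w-1)/2) = floor(w/2) points.
    """
    calls = 0
    while w >= 1:
        calls += 1
        w //= 2
    return calls


if __name__ == "__main__":
    for w in (1, 2, 3, 4, 7, 8, 100, 1000):
        print(w, calls_in_rectangle(w))
```

In this model the count equals floor(log2 w) + 1, which is the O(log w) behaviour used in the text.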

In conclusion, for a wide class of generator distributions, including the
case of independent generators, the expected time of SEARCHTREE is linear
in the number of segments.


6. CONCLUSIONS


In  this  thesis  we  have  defined  and  analyzed  a  new  algorithm  for  the 
adjacency  map,  using  a  technique  that  can  be  easily  extended  to  solve  the 
point  location  problem  in  a  planar  subdivision  induced  by  an  embedded 
straight-line  planar  graph. 

An advantage of this technique lies in the simplicity of the basic
idea: the extension of binary search to planar structures. However, dealing
with two dimensions requires some care in the implementation of the binary
search to keep both the storage and the search time bounded.

From a practical point of view, our algorithm is quite attractive,
since it builds in time $O(n \log n)$ a data structure that can be stored in
space $O(n \log n)$ and searched in time $O(\log n)$. Moreover, all the constants
involved are small.

Theoretically though, we know that $O(n)$ space is achievable, and
therefore the algorithm is not asymptotically optimal in the worst case.
However, while it is possible to find cases in which $S(n) = \Theta(n \log n)$ (the
reader may try with the set $\{s_j = (-j, j) \mid j = 1, \ldots, n\}$), such cases require
a strong correlation in the position of all the segments. This suggests
that if the segments are random, for instance independent of each other,
the expected value of $S(n)$ should be linear in $n$.

The analysis developed in Section 5 confirms the foregoing conjecture
and also allows us to estimate the constant $k$ in $E[S(n)] = kn$, when the
statistical description of the segments is given. In the case of segments
with independent endpoints, we have seen that $k \approx 6$. Comparing this
result with the lower bound discussed in Section 4, which implies $k \ge 3$,
we may conclude that the storage performance of the algorithm is quite good.


The preprocessing time is also satisfactory, since, after presorting,
the procedure SEARCHTREE runs in average linear time. Finally, the search
time is already good ($< 3 \log n + 7$) in the worst case, and simulation
suggests it is slightly better ($\approx 2 \log n$) on the average.

Beyond the details of the analysis we may like to capture, on an intuitive
basis, the essential features of the algorithm that make its average time and
space linear. We can see the action developing as follows. First the
region to be searched is partitioned into $O(\log n)$ strips of the plane, the
principal slabs in the infinite model, and a point is located in a slab.
Then the search proceeds in the interior of the slab. A segment must be
represented in the search structure of each of the slabs that it intersects,
but this can be done at the expense of a small amount of extra storage, since
on the average a segment intersects fewer than 4 slabs. Each principal slab
is in turn partitioned into rectangles by the spanning segments. Comparison
against spanning segments allows point location in a rectangle. At this
point we only need a search structure for each rectangle, since different
rectangles do not interfere with each other. The size of such structures
depends on the number of points in the rectangle. The key point is now that
when the size $n$ of the problem increases, the number of rectangles increases
proportionally, but the distribution of the size of the rectangles does not
change. Therefore the average time to build, and the average space to
store, the related search tree are constant. Thus globally, the average
complexity is linear. The constant of proportionality is of course related
to the distribution of the rectangle size, and this in turn to the frequency
of spanning segments. When the endpoints are independent there are many
spanning segments, and this accounts for the fact that the constants are
small in this case.


As we have seen, the probabilistic analysis of the adjacency map has
given considerable insight into the main features of the point-location
technique used. The next step should be to extend the analysis to the
general case of point location in planar graphs. If, as we conjecture,
the complexity of the algorithm is the same in the general case, the
proposed technique can be considered a basic tool in computational geometry